Randomized Strong Recursive Skeletonization: Simultaneous compression and factorization of $\mathcal{H}^{2}$-matrices in the Black-Box Setting
The hierarchical matrix ($\mathcal{H}^{2}$-matrix) formalism provides a way
to reinterpret the Fast Multipole Method and related fast summation schemes in
linear algebraic terms. The idea is to tessellate a matrix into blocks in such
a way that each block is either small or of numerically low rank; this enables
the storage of the matrix and the application of it to a vector in linear or
close to linear complexity. A key motivation for the reformulation is to extend
the range of dense matrices that can be represented. Additionally,
$\mathcal{H}^{2}$-matrices in principle also extend the range of operations
that can be executed to include matrix inversion and factorization. While such
algorithms can be highly efficient for certain specialized formats (such as
HBS/HSS matrices based on ``weak admissibility''), inversion algorithms for
general $\mathcal{H}^{2}$-matrices tend to be based on nested recursions and
recompressions, making them challenging to implement efficiently. An exception
is the \textit{strong recursive skeletonization (SRS)} algorithm by Minden, Ho,
Damle, and Ying, which involves a simpler algorithmic flow. However, SRS
greatly increases the number of blocks of the matrix that need to be stored
explicitly, leading to high memory requirements. This manuscript presents the
\textit{randomized strong recursive skeletonization (RSRS)} algorithm, which is
a reformulation of SRS that incorporates the randomized SVD (RSVD) to
simultaneously compress and factorize an $\mathcal{H}^{2}$-matrix. RSRS is a
``black box'' algorithm that interacts with the matrix to be compressed only
via its action on vectors; this extends the range of the SRS algorithm (which
relied on the ``proxy source'' compression technique) to include dense matrices
that arise in sparse direct solvers.
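The black-box compression primitive underlying RSRS is the randomized SVD, which touches the matrix only through its action on vectors. A minimal sketch of that primitive (the function and variable names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def randomized_svd(matvec, rmatvec, n, rank, oversample=10, seed=None):
    """Rank-`rank` SVD of an n x n operator accessed only via its action
    on vectors (matvec) and its adjoint action (rmatvec)."""
    rng = np.random.default_rng(seed)
    k = rank + oversample
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    Y = matvec(Omega)                      # Y = A @ Omega
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
    # Project onto the basis: B = Q^T A, formed via the adjoint action.
    B = rmatvec(Q).T                       # (A^T Q)^T = Q^T A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Usage: compress a dense kernel matrix through its matvec alone.
x = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.abs(np.subtract.outer(x, x)))   # smooth kernel, rapid decay
U, s, Vt = randomized_svd(lambda X: A @ X, lambda X: A.T @ X, 200, rank=20)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

In RSRS this idea is applied blockwise within the hierarchical tessellation rather than to the matrix as a whole; the sketch only shows the matvec-driven access pattern that makes the method "black box".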
SlabLU: A Two-Level Sparse Direct Solver for Elliptic PDEs
The paper describes a sparse direct solver for the linear systems that arise
from the discretization of an elliptic PDE on a two dimensional domain. The
solver is designed to reduce communication costs and perform well on GPUs; it
uses a two-level framework, which is easier to implement and optimize than
traditional multi-frontal schemes based on hierarchical nested dissection
orderings. The scheme decomposes the domain into thin subdomains, or "slabs".
Within each slab, a local factorization is executed that exploits the geometry
of the local domain. A global factorization is then obtained through the LU
factorization of a block-tridiagonal reduced coefficient matrix. The solver has
complexity $O(N^{5/3})$ for the factorization step, and $O(N^{7/6})$ for each
solve once the factorization is completed.
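The global step, LU factorization of a block-tridiagonal reduced coefficient matrix, amounts to a block Thomas algorithm. A schematic dense-block version (the paper's solver additionally exploits structure within the blocks, which this sketch omits):

```python
import numpy as np

def block_tridiag_solve(D, L, U, b):
    """Solve a block-tridiagonal system by block LU (Thomas) elimination.
    D[i]: diagonal blocks; L[i], U[i]: sub-/super-diagonal coupling blocks."""
    n = len(D)
    D, b = [d.copy() for d in D], [v.copy() for v in b]
    for i in range(1, n):                    # forward elimination
        W = L[i-1] @ np.linalg.inv(D[i-1])   # block multiplier
        D[i] -= W @ U[i-1]                   # Schur complement for slab i
        b[i] -= W @ b[i-1]
    x = [None] * n                           # back substitution
    x[-1] = np.linalg.solve(D[-1], b[-1])
    for i in range(n - 2, -1, -1):
        x[i] = np.linalg.solve(D[i], b[i] - U[i] @ x[i+1])
    return x

# Usage: 4 "slabs" with 3x3 blocks, checked against a dense solve.
rng = np.random.default_rng(0)
m, n = 3, 4
D = [rng.standard_normal((m, m)) + 10*np.eye(m) for _ in range(n)]
L = [rng.standard_normal((m, m)) for _ in range(n - 1)]
U = [rng.standard_normal((m, m)) for _ in range(n - 1)]
b = [rng.standard_normal(m) for _ in range(n)]
A = np.zeros((n*m, n*m))
for i in range(n):
    A[i*m:(i+1)*m, i*m:(i+1)*m] = D[i]
    if i < n - 1:
        A[(i+1)*m:(i+2)*m, i*m:(i+1)*m] = L[i]
        A[i*m:(i+1)*m, (i+1)*m:(i+2)*m] = U[i]
x = block_tridiag_solve(D, L, U, b)
err = np.linalg.norm(A @ np.concatenate(x) - np.concatenate(b))
```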
The solver described is compatible with a range of different local
discretizations, and numerical experiments demonstrate its performance for
regular discretizations of rectangular and curved geometries. The technique
becomes particularly efficient when combined with very high-order convergent
multi-domain spectral collocation schemes. With this discretization, a
Helmholtz problem on a domain of size $1000\lambda \times 1000\lambda$ (for
which $N = 100\,\mbox{M}$) is solved in 15 minutes to 6 correct digits on a
high-powered desktop with GPU acceleration.
Parallel Optimizations for the Hierarchical Poincar\'e-Steklov Scheme (HPS)
Parallel optimizations for the 2D Hierarchical Poincar\'e-Steklov (HPS)
discretization scheme are described. HPS is a multi-domain spectral collocation
scheme that allows for combining very high order discretizations with direct
solvers, making the discretization powerful in resolving highly oscillatory
solutions to high accuracy. HPS can be viewed as a domain decomposition scheme
where the domains are connected directly through the use of a sparse direct
solver. This manuscript describes optimizations of HPS that are simple to
implement, and that leverage batched linear algebra on modern hybrid
architectures to improve the practical speed of the solver. In particular, the
manuscript demonstrates that the traditionally high cost of performing local
static condensation for discretizations involving very high local order can
be reduced dramatically.
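The batched-linear-algebra pattern is simple to illustrate: stack the local subdomain blocks along a leading axis, and one batched call performs static condensation (elimination of interior nodes) for every subdomain at once. A sketch with made-up block sizes, using NumPy's broadcasting of `linalg.solve` over the leading (batch) dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
nboxes, ni, nb = 64, 81, 36   # subdomains, interior nodes, boundary nodes
                              # (illustrative sizes, not HPS-specific)

# Local operator blocks for every subdomain, stacked along the batch axis.
Aii = rng.standard_normal((nboxes, ni, ni)) + 5.0 * np.eye(ni)  # interior-interior
Aib = rng.standard_normal((nboxes, ni, nb))                     # interior-boundary
Abi = rng.standard_normal((nboxes, nb, ni))                     # boundary-interior
Abb = rng.standard_normal((nboxes, nb, nb))                     # boundary-boundary

# Batched static condensation: S = Abb - Abi @ inv(Aii) @ Aib for all
# subdomains in one call; np.linalg.solve broadcasts over the batch axis.
X = np.linalg.solve(Aii, Aib)    # shape (nboxes, ni, nb)
S = Abb - Abi @ X                # one Schur complement per subdomain
```

On a GPU the same pattern maps onto batched BLAS/LAPACK routines (e.g. batched LU), which is the optimization the manuscript leverages; the NumPy sketch only shows the data layout.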
SkelFMM: A Simplified Fast Multipole Method Based on Recursive Skeletonization
This work introduces the kernel-independent multi-level algorithm "skelFMM"
for evaluating all pairwise interactions between points connected through a
kernel such as the fundamental solution of the Laplace or the Helmholtz
equations. The method is based on linear algebraic tools such as randomized low
rank approximation and "skeleton representations" of far-field interactions.
The work is related to previously proposed linear algebraic reformulations of
the fast multipole method (FMM), but is distinguished by relying on simpler
data structures. In particular, skelFMM does not require an "interaction list",
as it relies instead on algebraically-modified kernel interactions between
near-neighbors at every level. Like other kernel-independent algorithms, it
only requires evaluation of the kernel function, allowing the methodology to
easily be extended to a range of different kernels in 2D and 3D. The simplicity
of the algorithm makes it particularly amenable to parallel implementation on
heterogeneous hardware architectures.
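The "skeleton representation" idea can be illustrated with a column-pivoted QR: for a far-field interaction block, a small set of skeleton source points reproduces every column to a tolerance. A sketch of this interpolative-decomposition-style compression (the helper below is hypothetical, not the skelFMM code), applied to a $1/r$ kernel between two well-separated point clusters:

```python
import numpy as np
from scipy.linalg import qr

def skeletonize(A, tol=1e-10):
    """Pick skeleton columns of A via column-pivoted QR so that
    A ~= A[:, idx] @ T (an interpolative decomposition sketch)."""
    Q, R, piv = qr(A, pivoting=True, mode='economic')
    d = np.abs(np.diag(R))
    k = int(np.sum(d > tol * d[0]))            # numerical rank estimate
    idx = piv[:k]                              # skeleton column indices
    # Express all (pivoted) columns in terms of the skeleton columns.
    T = np.linalg.solve(R[:k, :k], R[:k, :])
    Tfull = np.empty_like(T)                   # undo the column pivoting
    Tfull[:, piv] = T
    return idx, Tfull

# Far-field block: 1/r kernel between two well-separated clusters.
rng = np.random.default_rng(0)
src = rng.random((300, 2))
trg = rng.random((200, 2)) + 5.0               # well separated from src
A = 1.0 / np.sqrt(((trg[:, None, :] - src[None, :, :])**2).sum(-1))
idx, T = skeletonize(A)
err = np.linalg.norm(A - A[:, idx] @ T) / np.linalg.norm(A)
```

Because the two clusters are well separated, the numerical rank (the number of skeleton points) is far smaller than the number of sources, which is what makes skeleton representations of far-field interactions cheap.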
The performance of the algorithm is demonstrated through numerical
experiments conducted on uniform and non-uniform point distributions in 2D and
3D, involving Laplace and (low frequency) Helmholtz kernels. The algorithm
relies on a precomputation stage that constructs a tailored representation for
a given geometry of points. Once the precomputation has completed, the
matrix-vector multiplication attains high speed through GPU acceleration that
leverages batched linear algebra.